Collaborators

Include the names of your collaborators here.

Overview

This homework is dedicated to working with non-Bayesian regularization. You will fit your own Ridge and Lasso penalized regression models to gain experience interpreting elastic net model results. You will also practice tuning a penalized regression model by finding the best regularization strength, \(\lambda\), using a train/test split. You will then use the glmnet and caret packages to tune elastic net models and identify the most important features in a model by turning off unimportant features.

This assignment will provide practice working with realistic modeling situations such as large numbers of inputs, examining the influence of correlated features, working with interactions, and working with categorical inputs.

If you do not have glmnet or caret installed, please install both before starting the assignment.

You will also work with the corrplot package in this assignment. You must download and install corrplot before starting the assignment.

IMPORTANT: The RMarkdown assumes you have downloaded the data sets (CSV files) to the same directory you saved the template Rmarkdown file. If you do not have the CSV files in the correct location, the data will not be loaded correctly.

IMPORTANT!!!

Certain code chunks are created for you. Each code chunk has eval=FALSE set in the chunk options. You MUST change it to be eval=TRUE in order for the code chunks to be evaluated when rendering the document.

You are free to add more code chunks if you would like.

Load packages

This assignment will use packages from the tidyverse suite.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

This assignment also uses the broom package. The broom package is part of tidyverse and so you have it installed already. The broom package will be used via the :: operator later in the assignment and so you do not need to load it directly.

The glmnet package will be loaded later in the assignment. As stated previously, please download and install glmnet if you do not currently have it.

Problem 01

You will gain experience with non-Bayesian Ridge and Lasso regression by working with a large number of inputs in this problem. The training data are loaded for you below.

dfA_train <- readr::read_csv('hw09_probA_train.csv', col_names = TRUE)
## Rows: 96 Columns: 76
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (76): x01, x02, x03, x04, x05, x06, x07, x08, x09, x10, x11, x12, x13, x...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

All input variables have names that start with x and the output is named y. The glimpse provided below shows there are many input columns!

dfA_train %>% glimpse()
## Rows: 96
## Columns: 76
## $ x01 <dbl> -2.06943319, -1.29642012, 0.91302407, 0.53847200, 1.58258670, -2.3…
## $ x02 <dbl> -0.69220535, -0.74646173, 0.50415529, -2.24983365, -0.06869533, -1…
## $ x03 <dbl> -1.01314640, 0.46711663, 0.35073648, -0.26764563, -0.06524989, 0.8…
## $ x04 <dbl> 1.3310201, -1.1901216, 0.4567449, 1.9486926, 1.1343209, -0.5150867…
## $ x05 <dbl> 0.2662435, -0.2801269, 0.5267086, -1.2564140, 1.5565706, -0.237865…
## $ x06 <dbl> -0.72471702, -0.09158292, -0.70865959, 1.07960684, 0.72399642, -0.…
## $ x07 <dbl> 0.119003559, 1.794130638, -0.131307071, -0.634631730, 0.791042823,…
## $ x08 <dbl> -0.21240839, 1.43690253, 0.37810170, -0.95218224, 0.02586663, -0.9…
## $ x09 <dbl> 0.86530394, -0.13721049, -0.87982548, -1.81113831, -0.60337380, -1…
## $ x10 <dbl> -0.67479792, 0.49138297, -0.40283973, 1.21654838, 1.37487525, -2.9…
## $ x11 <dbl> 0.4768842, 1.3187331, -0.2375626, -0.8417482, -0.2555588, -0.73944…
## $ x12 <dbl> -0.55028261, -0.57696349, 0.34973240, -0.31482804, -1.32927922, 1.…
## $ x13 <dbl> 1.14678993, -0.03428652, -0.36594800, -0.19693086, 2.64028389, -0.…
## $ x14 <dbl> 0.62387485, -0.25243935, -0.23542049, -1.60157449, 1.17348036, -1.…
## $ x15 <dbl> -0.11340307, 0.62118450, -0.07521731, 0.22553162, -0.03979362, -0.…
## $ x16 <dbl> -1.04538727, -1.65054917, 1.35264110, -0.74933493, -0.20964781, -0…
## $ x17 <dbl> 0.47243145, 0.05101400, -0.61833554, -0.03849856, -0.47878135, 0.8…
## $ x18 <dbl> -0.590890777, 1.169765794, -0.059145998, -0.309537577, 0.842366000…
## $ x19 <dbl> 0.53364049, 2.33425890, -0.47852405, 0.54132470, 0.60261169, -2.02…
## $ x20 <dbl> -0.05844756, -0.74187288, 0.95175354, -0.20784247, 1.13489628, -2.…
## $ x21 <dbl> 0.65525513, 1.68884588, -0.57038117, -1.80555409, 2.08148634, 0.53…
## $ x22 <dbl> -1.20221003, 1.74128942, -0.74659747, 0.87619987, -0.10779368, -0.…
## $ x23 <dbl> 0.87133442, -0.14821616, 2.14691210, -0.19088522, -0.29948310, 0.3…
## $ x24 <dbl> 0.87836580, 1.12460649, 0.22236027, 0.20390661, 0.72825984, -0.163…
## $ x25 <dbl> -0.64288824, -0.06167637, -0.34527940, 0.62755689, -0.29965224, -0…
## $ x26 <dbl> -0.43031692, -0.72294116, -0.02082838, -0.19986085, 0.94931216, -1…
## $ x27 <dbl> -0.549731535, 0.104640571, -0.005052764, -1.555809942, 0.800257477…
## $ x28 <dbl> -0.37636467, 1.05012853, 1.09066971, -1.73283373, -0.52552393, -0.…
## $ x29 <dbl> 1.29310734, 1.13690150, -0.32233373, -0.31890823, 0.40454183, -0.0…
## $ x30 <dbl> 1.60799472, -0.34339053, -1.94283061, -0.61647080, 0.74117929, 1.0…
## $ x31 <dbl> -1.8940459, -1.3444175, 0.4917738, 1.2478144, 0.7653535, 0.4159829…
## $ x32 <dbl> 1.69469353, 0.17893691, 0.08081505, -1.63310270, 0.75915432, 0.541…
## $ x33 <dbl> -1.55078952, -0.15141527, -0.44858391, 0.21312202, 0.63037564, 0.7…
## $ x34 <dbl> 2.07109783, 0.95276628, 1.32599060, 2.27539009, -0.12636968, 0.247…
## $ x35 <dbl> 1.82290334, -0.31907075, -1.16353613, 0.85407328, -1.28814754, 0.6…
## $ x36 <dbl> -0.7394679, -1.7322869, 1.0537616, 0.1366273, -0.4719353, 0.413739…
## $ x37 <dbl> 0.93508239, -0.79537446, -1.87254115, 0.11414416, 0.05009938, -0.1…
## $ x38 <dbl> -0.507117532, -0.001069243, -0.636333478, 0.419641050, -0.47657448…
## $ x39 <dbl> 0.22888128, 1.31815563, 0.42261732, 0.15486481, -0.94886750, 0.802…
## $ x40 <dbl> -1.53550785, 0.08945169, 0.10561738, 1.40392237, 3.16451628, -1.59…
## $ x41 <dbl> 0.54995962, -1.49673970, 0.45267944, 0.07628427, -0.55596341, -0.7…
## $ x42 <dbl> -1.8288926, 1.7451062, 0.8158462, -0.1390173, -0.8047000, -0.11506…
## $ x43 <dbl> 1.90648214, 0.67113809, -0.05362633, 0.39563373, 2.67730180, 0.922…
## $ x44 <dbl> 1.29447870, -1.68009812, -0.32558008, -1.59679394, -0.49488477, -0…
## $ x45 <dbl> 0.95338061, -0.99034220, -0.98995377, 0.26891339, 0.12166661, -0.9…
## $ x46 <dbl> -0.13483662, -0.50696860, -0.51738788, -2.06234042, -0.10680057, -…
## $ x47 <dbl> 2.43187944, 0.37847403, -0.06741761, 0.59418109, -0.22940658, -0.7…
## $ x48 <dbl> -0.9565450, 0.4803113, -0.3920700, -0.4585714, -1.9929885, 0.88366…
## $ x49 <dbl> 0.94853247, 0.26362181, -0.16464646, 0.11935923, -0.32447324, 0.04…
## $ x50 <dbl> 0.19018260, 1.13505566, -0.16972687, -1.50584636, -0.07841405, -1.…
## $ x51 <dbl> -0.6831468, 0.1100079, 0.7239210, 0.2311925, -0.7642252, 0.1093128…
## $ x52 <dbl> 0.65307722, -0.80046181, 0.86927368, 1.59861934, -0.47053886, -0.0…
## $ x53 <dbl> -0.95918072, 0.09582336, 0.05805066, -1.65059194, 0.52212944, -0.8…
## $ x54 <dbl> -0.41506279, 0.53415776, -1.08420393, 0.31151571, 1.11321863, 2.20…
## $ x55 <dbl> -0.9469963, -0.9124609, -1.4035349, -0.3556957, 0.7224917, -0.1489…
## $ x56 <dbl> 0.54883719, -1.91727346, 0.18654617, -0.75041670, -1.04649763, 0.3…
## $ x57 <dbl> -0.72422453, 1.43064413, -1.47964130, 0.57516981, 0.49719144, 0.58…
## $ x58 <dbl> -2.20950208, -0.23497601, 1.07557427, 0.24929374, 0.56432730, -0.2…
## $ x59 <dbl> -0.8485211, -0.5308077, 1.5559665, -1.6331253, -0.2573712, -0.2842…
## $ x60 <dbl> -2.52402189, 1.29349906, -0.20218647, 1.87527348, -0.87476342, -0.…
## $ x61 <dbl> -0.66892027, -1.72557732, 0.79925161, 1.80655230, 0.22016194, -2.3…
## $ x62 <dbl> -0.92960688, 0.04079371, 0.33876203, 0.72753901, -1.08300120, 0.71…
## $ x63 <dbl> -0.42021184, 2.03151546, -0.21235170, -0.54181759, 1.27431677, 0.1…
## $ x64 <dbl> -0.68218725, -0.13237431, 0.66637214, 0.63351491, 0.70331183, -1.2…
## $ x65 <dbl> -1.853949165, -0.006625952, -1.239688043, 1.724372575, -0.53278079…
## $ x66 <dbl> 0.758622374, -0.247157562, -1.170425464, 1.188988247, 0.423596811,…
## $ x67 <dbl> 1.506210427, 0.085497740, -1.698973185, 1.076071882, -0.459228447,…
## $ x68 <dbl> 0.32879336, 0.80760027, -2.87189688, 0.15125671, 0.24641242, 0.857…
## $ x69 <dbl> -0.22660315, -0.86527369, -0.37419920, 1.28663788, -0.17221500, -1…
## $ x70 <dbl> 0.15206087, 0.50449311, 1.24957901, -3.66360118, 0.46417993, -1.09…
## $ x71 <dbl> 1.10750748, -1.51821411, -1.04662914, 0.91478065, 0.31102705, -1.2…
## $ x72 <dbl> -0.46295847, 0.75343551, -1.12696912, -0.84281597, -1.22013393, 0.…
## $ x73 <dbl> -0.74889700, -1.29222023, -2.58930724, -1.74360767, -1.09158189, 0…
## $ x74 <dbl> 1.48771931, -0.56530125, -1.56187174, -1.10624410, -1.65851229, -0…
## $ x75 <dbl> -1.22958786, 0.57706123, 0.38631089, 1.37850476, -1.21528450, -2.7…
## $ y   <dbl> -5.033324, -8.874837, 1.720462, 16.535303, 22.683417, -3.642452, -…

The data are reshaped for you into long-format. The output y is preserved and all input columns are “stacked” on top of each other. The name column provides the input’s name and the value column is the value of the input. The input_id column is created for you. It stores the input ID as a number instead of a string. The stringr package is part of tidyverse.

lfA_train <- dfA_train %>% 
  tibble::rowid_to_column() %>% 
  pivot_longer(starts_with('x')) %>% 
  mutate(input_id = as.numeric(stringr::str_extract(name, '\\d+')))

1a)

It is always important to explore data before modeling. You will use regularized regression models in this assignment. Regularized regression requires that all features have roughly the same scale in order for the regularization strength to have the same influence.

Use a boxplot to check if all inputs have roughly the same scale. Pipe the long-format data, lfA_train to ggplot() and map the input_id to the x aesthetic and the value to the y aesthetic. Add the geom_boxplot() layer and map the input_id to the group aesthetic.

How do the scales compare across the inputs?

HINT: Remember the importance of the aes() function!

SOLUTION

ggplot(lfA_train, mapping = aes(x = input_id, y = value)) +
  geom_boxplot(mapping = aes(group = input_id))

Visually examining the boxplot, the boxes have similar heights and the whiskers have similar lengths across the inputs. The scales are therefore roughly the same across all inputs.
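As an optional numeric complement to the boxplot (assuming the long-format lfA_train created above), the per-input standard deviations can be summarized directly; roughly equal standard deviations across the inputs confirm the common scale:

```r
lfA_train %>% 
  group_by(name) %>% 
  summarise(input_sd = sd(value), .groups = 'drop') %>% 
  arrange(desc(input_sd))
```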

1b)

Let’s now visualize the relationship between the output, y, and each input. Each input will be associated with a facet, but you will still need to show several figures because there are so many inputs in this problem.

Create a scatter plot to show the relationship between the output and each input. Use facets for each input.

In the first code chunk, pipe the long-format data to filter() and keep all rows where input_id is less than 26. Pipe the result to ggplot() and map value to the x aesthetic and map y to the y aesthetic. Add the appropriate geom to make the scatter plot and add a facet_wrap() layer with the facets “as a function” of name.

In the second code chunk, pipe the long-format data to filter() and keep all rows where input_id is between 26 and 50. Use the same aesthetics and facet structure as the first code chunk.

In the third code chunk, pipe the long-format data to filter() and keep all rows where input_id is greater than 50. Use the same aesthetics and facet structure as the first code chunk.

Set the alpha to 0.33 in all three scatter plots.

HINT: The between() function can help you here…

SOLUTION

### the first 25 inputs
lfA_train %>%
  filter(input_id < 26) %>%
  ggplot(aes(x = value, y = y)) +
  geom_point(alpha = 0.33) +
  facet_wrap(~ name, scales = "free")

### inputs between 26 and 50
lfA_train %>%
  filter(between(input_id , 26, 50)) %>%
  ggplot(aes(x = value, y = y)) +
  geom_point(alpha = 0.33) +
  facet_wrap(~ name, scales = "free")

### last 25 inputs
lfA_train %>%
  filter(input_id > 50) %>%
  ggplot(aes(x = value, y = y)) +
  geom_point(alpha = 0.33) +
  facet_wrap(~ name, scales = "free")

1c)

Based on your previous figures, which inputs seem related to the output?

SOLUTION


In the scatter plots for input_id less than 26, the input x07 appears related to the output y: there is a clear negative linear relationship between the two. In contrast, the scatter plots for the remaining inputs show no clear relationship with the output; the patterns look random and scattered.

1d)

Let’s fit a non-Bayesian linear model by maximizing the likelihood. You do not have to program the model from scratch. Instead, you are allowed to use lm() to fit the model.

Fit a linear model by maximizing the likelihood using linear additive features for all inputs.

You must use the original wide-format data, dfA_train.

Assign the model to the modA object.

SOLUTION

modA <- lm( y ~ ., data = dfA_train )

1e)

Use the coefplot::coefplot() function to visualize the coefficient summaries for your linear additive features model.

SOLUTION

coefplot::coefplot(modA) 

1f)

There are other ways to examine the coefficient summaries besides the coefficient plot visualization. The broom package has multiple functions which can help interpret model behavior. The broom package organizes model results into easy to manipulate TIDY dataframes. You will use the broom::tidy() function to examine the coefficient summaries. Applying the broom::tidy() function to an lm() model object produces a dataframe. One row is one coefficient. The term column is the name of feature the coefficient is applied to and the other columns provide different statistics about the coefficient. You can use any other tidyverse function, such as select(), filter(), count(), and others to manipulate the broom::tidy() returned dataframe!

Non-Bayesians focus on the p-value (p.value column from broom::tidy()) to identify if the feature is statistically significant.

Use broom::tidy() to identify which inputs non-Bayesians will identify as being statistically significant.

HINT: The common convention for statistical significance is 0.05…

SOLUTION


broom::tidy(modA) %>% 
  filter(p.value < 0.05)
## # A tibble: 14 × 5
##    term  estimate std.error statistic   p.value
##    <chr>    <dbl>     <dbl>     <dbl>     <dbl>
##  1 x02      -3.76      1.49     -2.52 0.0203   
##  2 x03       7.65      2.01      3.81 0.00109  
##  3 x07     -11.3       2.01     -5.63 0.0000165
##  4 x08       4.57      1.78      2.56 0.0186   
##  5 x11       4.15      1.95      2.13 0.0460   
##  6 x14      -5.82      2.04     -2.85 0.00989  
##  7 x19      -3.26      1.52     -2.15 0.0443   
##  8 x32       4.00      1.87      2.14 0.0447   
##  9 x33      -3.84      1.53     -2.51 0.0207   
## 10 x43       3.20      1.52      2.11 0.0480   
## 11 x51      -5.35      2.16     -2.47 0.0225   
## 12 x66       6.00      2.16      2.78 0.0115   
## 13 x68      -3.98      1.79     -2.22 0.0381   
## 14 x69      -5.41      2.16     -2.51 0.0209

Problem 02

You have explored the data and fit the first un-regularized non-Bayesian model. You have examined the statistically significant features, but now you must use regularized regression to identify the important features. Regularized regression is an important tool for “turning off” unimportant features in models. Regularized regression is especially useful when there are many features, as with this application. Consider for example that you are working at a startup company that produces components for medical devices. Your company makes the components with Additive Manufacturing (3D printing) and are trying to maximize the longevity of the printed part. Additive Manufacturing involves many different features that control the performance of a component. The company has tracked variables associated with the supplied material, the operation of the printer, and measured features associated with the printed objects. The company identified 75 inputs they think impact a Key Performance Indicator (KPI). It is your job to identify the most important inputs that influence the output!

This may seem like a daunting task, but that is where regularized regression can help! In lecture we introduced non-Bayesian regularized regression by stating the regression coefficients, the \(\boldsymbol{\beta}'s\), are estimated by minimizing the following loss function:

\[ SSE + \lambda \times penalty \]

The \(penalty\) formulation depends on if we are using Ridge or Lasso regression. However, in this assignment we will work with a slightly different loss function. We will work with the loss function consistent with the glmnet package:

\[ \frac{1}{2} MSE + \lambda \times penalty \]

This does not change the concepts discussed in lecture. It simply changes the magnitude of the regularization strength parameter, \(\lambda\). As an aside, this demonstrates why I prefer the Bayesian interpretation of regularization. The regularization is more interpretable in the Bayesian setting. The regularization is the prior standard deviation which controls the allowable range on the coefficients. But that is a philosophical debate for another time!
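To make the aside concrete: assume a Gaussian likelihood with known \(\sigma\) and independent Gaussian priors, \(\beta_d \sim \mathrm{N}\left(0, \tau^2\right)\). Dropping additive constants, the negative log-posterior is

\[ -\log p\left(\boldsymbol{\beta} \mid \mathbf{y}\right) = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(y_n - \mu_n\right)^2 + \frac{1}{2\tau^2} \sum_{d} \beta_d^2 \]

Multiplying through by \(\sigma^2 / N\) recovers the glmnet-style loss with the Ridge \(penalty\) and \(\lambda = \sigma^2 / \left(2 N \tau^2\right)\). A large prior standard deviation \(\tau\) therefore corresponds to a weak regularization strength, and a small \(\tau\) to a strong one.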

You must program the regularized loss function yourself, rather than using glmnet to fit the regularized regression model! This will highlight how the regularization strength is applied and how it relates to the SSE (or MSE). This problem deals with programming the RIDGE penalty and comparing the resulting RIDGE estimates to the coefficient MLEs.

2a)

You will program the Ridge loss function in a manner consistent with the log-posterior functions from previous assignments. You must create an initial list of required information before you program the function. This will allow you to test your function to make sure you programmed it correctly.

Complete the code chunk below by filling in the fields to the list of required information. The $yobs field is the “regular” R vector for the output. The $design_matrix is the design matrix for the problem. The $lambda field is the regularization strength, \(\lambda\).

You must use the original wide format dfA_train to specify the output and create the design matrix. The design matrix must be made consistent with how we have organized design matrices in lecture and thus must include the “intercept column of 1s”. Your design matrix should use linear additive features for all inputs. Lastly, set the regularization strength to 1.

SOLUTION

check_info_A <- list(
  yobs = dfA_train$y,
  design_matrix = model.matrix( y ~ ., data = dfA_train),
  lambda = 1
)

2b)

The function, loss_ridge(), is started for you in the code chunk below. The loss_ridge() function consists of two arguments. The first, betas, is the “regular” R vector of the regression coefficients and the second, my_info, is the list of required information. You must assume that my_info contains fields you specified in the check_info_A list in the previous problem.

Complete the code chunk below by calculating the mean trend, the model’s mean squared error (MSE) on the training set, and the RIDGE penalty, and then returning the effective Ridge loss. The comments and variable names below state what you should calculate at each line.

To check if you programmed loss_ridge() correctly, test the function with a guess of -1.2 for all regression coefficients. If you programmed loss_ridge() correctly you should get a value of 287.3114. As another test, use 0.2 for all regression coefficients. If you programmed loss_ridge() correctly you should get a value of 111.2219.

SOLUTION

loss_ridge <- function(betas, my_info)
{
  # extract the design matrix
  X <- my_info$design_matrix
  
  # calculate linear predictor
  mu <- as.vector( X %*% as.matrix(betas) )
  
  # calculate MSE
  MSE <- mean( (my_info$yobs - mu)^2 )
  
  # calculate ridge penalty
  penalty <- my_info$lambda * sum( betas^2 )
  
  # return effective total loss
  0.5 * MSE + penalty
}

Test loss_ridge() with -1.2 for all regression coefficients.

loss_ridge(rep(-1.2, ncol(check_info_A$design_matrix)), check_info_A)
## [1] 287.3114

Test loss_ridge() with 0.2 for all regression coefficients.

loss_ridge(rep(0.2, ncol(check_info_A$design_matrix)), check_info_A)
## [1] 111.2219

2c)

You will use the optim() function to manage the optimization and thus fitting of the Ridge regression model. In truth, Ridge regression has a closed form analytic expression for the \(\boldsymbol{\beta}\) estimates! The formula has a lot in common with the Bayesian posterior mean formula assuming a known \(\sigma\)! However, we will focus on executing and interpreting the results rather than the mathematical derivation in this assignment.
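As an optional aside (not required for the assignment), that closed form can be sketched directly. For the loss \(\frac{1}{2} MSE + \lambda \sum_{d} \beta_d^2\), setting the gradient with respect to \(\boldsymbol{\beta}\) to zero gives \(\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \mathbf{X} + 2 N \lambda \mathbf{I}\right)^{-1} \mathbf{X}^T \mathbf{y}\). A minimal self-contained sketch on simulated data (note: unlike glmnet, this version penalizes the intercept as well):

```r
# closed-form Ridge estimate for the loss 0.5*MSE + lambda*sum(beta^2)
ridge_closed_form <- function(lambda, X, y) {
  N <- nrow(X)
  # solve (X'X + 2*N*lambda*I) beta = X'y
  solve(t(X) %*% X + 2 * N * lambda * diag(ncol(X)), t(X) %*% y)
}

# quick check on simulated data: lambda = 0 recovers the least squares solution
set.seed(1)
X_demo <- cbind(1, matrix(rnorm(300), ncol = 3))
y_demo <- X_demo %*% c(1, 2, -1, 0.5) + rnorm(100, sd = 0.1)
ridge_closed_form(0, X_demo, y_demo)     # approximately (1, 2, -1, 0.5)
ridge_closed_form(1000, X_demo, y_demo)  # heavily shrunk toward 0
```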

You must define a function, fit_regularized_regression(), to manage the optimization for you. This function will be general and not specific to Ridge regression. This way you can use this same function later in the assignment to fit the Lasso regression model. The fit_regularized_regression() function is started for you in the code chunk below and has three arguments. The first, lambda_use, is the assumed regularization strength parameter value. The second, loss_func, is a function handle for the loss function you are using to estimate the coefficients, and the third argument, my_info, is the list of required information.

The my_info argument is different from check_info_A you defined previously. The my_info argument in fit_regularized_regression() does NOT include $lambda. Instead, you must assign lambda_use to the $lambda field. The purpose of fit_regularized_regression() is to allow iterating over many different values for \(\lambda\).

The bulk of fit_regularized_regression() is calling optim() to execute the optimization. You must specify the arguments correctly to the optim() function. Please note that you are minimizing the loss and so you should not specify fnscale in the control argument to optim().

The last portion of fit_regularized_regression() is a bookkeeping step which organizes the coefficient estimates in a manner consistent with the broom::tidy() function.

Complete the code chunk below by specifying the initial guess as 0 for all regression coefficients, assigning lambda_use to the $lambda field, completing the arguments to the optim() call, and completing the bookkeeping correctly. The estimate column in the returned tibble is the optimized estimate for the coefficients. The term column in the returned tibble is the coefficient name. You do NOT need to return the Hessian matrix from optim(), and you should specify the method as 'BFGS' and the maximum number of iterations as 5001.

HINT: How can you identify the feature names associated with each coefficient?

HINT: Which of the returned elements from optim() correspond to the optimized estimates?

SOLUTION

fit_regularized_regression <- function(lambda_use, loss_func, my_info)
{
  # create the initial guess of all zeros
  start_guess <- rep(0, ncol(my_info$design_matrix))
  
  # add the regularization strength to the list of required information
  my_info$lambda <- lambda_use
  
  fit <- optim(start_guess,
               loss_func,
               gr = NULL,
               my_info,
               method = "BFGS",
               control = list(maxit = 5001),
               hessian = FALSE)
  
  # return the regularized beta estimates
  tibble::tibble(
    term = colnames(my_info$design_matrix),
    estimate = fit$par
  ) %>% 
    mutate(lambda = lambda_use)
}

2d)

You need to specify a new list of required information that only contains the output and design matrix in order to test the fit_regularized_regression() function.

Complete the first code chunk below by filling in the fields to the list of required information. The $yobs field is the “regular” R vector for the output. The $design_matrix is the design matrix for the problem. You must use the original wide format dfA_train to specify the output and create the design matrix and you use should linear additive features for all inputs.

Complete the second code chunk below by fitting the Ridge regularized model with a low regularization strength value of 0.0001. Assign the result to the check_ridge_fit_lowpenalty object. Display the glimpse() of check_ridge_fit_lowpenalty to the screen.

SOLUTION

reg_info <- list(
  yobs = dfA_train$y,
  design_matrix = model.matrix( y ~ ., data = dfA_train )
)

Fit the Ridge model with a regularization strength of 0.0001.

check_ridge_fit_lowpenalty <- fit_regularized_regression(0.0001, loss_ridge, reg_info)
check_ridge_fit_lowpenalty 
## # A tibble: 76 × 3
##    term        estimate lambda
##    <chr>          <dbl>  <dbl>
##  1 (Intercept)  -1.31   0.0001
##  2 x01           0.494  0.0001
##  3 x02          -3.74   0.0001
##  4 x03           7.62   0.0001
##  5 x04          -1.46   0.0001
##  6 x05          -0.0213 0.0001
##  7 x06          -1.85   0.0001
##  8 x07         -11.3    0.0001
##  9 x08           4.53   0.0001
## 10 x09          -2.02   0.0001
## # … with 66 more rows

2e)

The code chunk below is completed for you. It defines a function, viz_compare_to_mles(), which compares the regularized coefficient estimates to the MLEs. The first argument is a dataframe (tibble) returned from the fit_regularized_regression() function. The second argument is an lm() model object.

### create function to visualize the coefficient estimates
viz_compare_to_mles <- function(regularized_estimates, lm_mod)
{
  regularized_estimates %>% 
    left_join(broom::tidy(lm_mod) %>% 
                select(term, mle = estimate),
              by = 'term') %>% 
    pivot_longer(c("estimate", "mle")) %>% 
    filter(term != '(Intercept)') %>% 
    ggplot(mapping = aes(y = value, x = term)) +
    geom_hline(yintercept = 0, color = 'grey30', 
               size = 1.2, linetype = 'dashed') +
    stat_summary(geom = 'linerange',
                 fun.max = 'max', fun.min = 'min',
                 mapping = aes(group = term),
                 color = 'black', size = 1) +
    geom_point(mapping = aes(color = name,
                             shape = name)) +
    coord_flip() +
    scale_shape_discrete('', solid = FALSE) +
    scale_color_brewer('', palette = 'Set1') +
    labs(y = 'coefficient estimate', x = 'coefficient') +
    theme_bw()
}

Use the viz_compare_to_mles() function to compare the Ridge regularized coefficients with the low \(\lambda\) value to the coefficient MLEs.

How do they compare?

SOLUTION

viz_compare_to_mles(check_ridge_fit_lowpenalty, modA)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.

How do the coefficients compare?
The two sets of points essentially overlap in the figure, so the Ridge estimates under this very low regularization strength are nearly identical to the MLEs.

2f)

Let’s now try a high regularization strength of 10000 and see the impact on the coefficient estimates.

Use the fit_regularized_regression() to fit the Ridge regression model with a \(\lambda\) value of 10000. Assign the result to the check_ridge_fit_highpenalty object. Use the viz_compare_to_mles() function to compare the high penalized Ridge coefficient estimates to the MLEs.

How do the high penalized estimates compare the MLEs?

SOLUTION

check_ridge_fit_highpenalty <- fit_regularized_regression(10000, loss_ridge, reg_info)
viz_compare_to_mles(check_ridge_fit_highpenalty, modA)

How do the coefficients compare to the MLEs?
The figure shows that the heavily penalized Ridge estimates are shrunk essentially to 0, and so they bear little resemblance to the MLEs.

2g)

The first argument to the fit_regularized_regression() function was intentionally set to lambda_use to allow iterating over a sequence of \(\lambda\) values. This allows us to try out dozens to hundreds of candidate \(\lambda\) values and see the regularization effect on the coefficient estimates.

However, you must first create a sequence of candidate \(\lambda\) values to iterate over!

Define a sequence of lambda values and assign the result to the lambda_grid variable in the first code chunk below. The grid should be 101 evenly spaced points in the natural log scale. The values in lambda_grid however must be in the original \(\lambda\) scale and thus NOT in the log-scale. The lower bound should correspond to \(\lambda = 0.0001\) and the upper bound should correspond to \(\lambda = 10000\).

The second code chunk executes the iteration for you with the purrr::map_dfr() function. The map_dfr() function applies the fit_regularized_regression() function to each unique value in the lambda_grid vector.

SOLUTION

lambda_grid <- exp( seq(log(0.0001), log(10000), length.out = 101) )

The eval flag is set to FALSE in the code chunk arguments below. Set eval=TRUE once you have properly defined lambda_grid.

Please note: The code chunk below may take a few minutes to complete.

ridge_path_results <- purrr::map_dfr(lambda_grid,
                                     fit_regularized_regression,
                                     loss_func = loss_ridge,
                                     my_info = reg_info)

2h)

It’s now time to visualize the coefficient path for the Ridge regression model! The code chunk below is started for you. The Intercept is removed to be consistent with the coefficient path visualizations created by the glmnet package.

Complete the code chunk below by piping the dataframe to ggplot(). Map the log(lambda) to the x aesthetic and the estimate to the y aesthetic. Include a geom_line() layer and map the term variable to the group aesthetic within geom_line(). Set the alpha to 0.33.

SOLUTION

ridge_path_results %>% 
  filter(term != '(Intercept)') %>% 
  ggplot(mapping = aes(x = log(lambda), y = estimate)) +
  geom_line(mapping = aes(group = term),alpha = 0.33)

2i)

Describe the behavior of the coefficient paths as the regularization strength increases in the plot shown in 2h).

SOLUTION

As the regularization strength increases, the coefficient estimates shrink smoothly toward 0. At low regularization strengths the estimates are close to the MLEs; as \(\lambda\) grows, each path is gradually pulled toward 0 (with the Ridge penalty the coefficients approach 0 but are never set exactly to 0).

The slope of a coefficient path indicates how quickly that coefficient changes as the regularization strength changes. Inputs that are more strongly related to the output retain larger coefficient magnitudes before being shrunk.

To summarize, the Ridge coefficient paths smoothly decay toward 0 as the regularization strength increases, with the inputs most strongly related to the output resisting the shrinkage the longest.

Problem 03

As discussed in lecture the Ridge penalty is analogous to applying independent Gaussian priors in Bayesian linear models! However, the regularization penalty can take other forms besides the Ridge penalty. The LASSO penalty behaves differently than Ridge. This problem is focused on applying the Lasso penalty and comparing it to the behavior of the Ridge penalty.

You will begin by defining a loss function similar to the loss_ridge() function you defined earlier. You will then use this new function, loss_lasso(), to fit the Lasso regression model.
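As context for the comparison (a standard result, not something you need for the assignment): just as the Ridge penalty corresponds to independent Gaussian priors, the Lasso penalty corresponds to independent Laplace (double-exponential) priors, \(p\left(\beta_d\right) \propto \exp\left(-\left|\beta_d\right| / b\right)\). The negative log of this prior contributes \(\frac{1}{b} \sum_{d} \left|\beta_d\right|\) to the loss, which is the Lasso \(penalty\) with \(\lambda\) inversely proportional to the prior scale \(b\). The sharp peak of the Laplace prior at 0 is what allows Lasso to set coefficients exactly to 0.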

3a)

The function, loss_lasso(), is started for you in the code chunk below. The loss_lasso() function has the same structure as the loss_ridge() function and thus has 2 arguments. The first, betas, is the “regular” R vector of the regression coefficients and the second, my_info, is the list of required information. You must assume that my_info contains the same fields as the my_info list used within loss_ridge().

Complete the code chunk below by calculating the mean trend, the model’s mean squared error (MSE) on the training set, and the LASSO penalty, and lastly returning the effective Lasso loss. The comments and variable names below state what you should calculate at each line.

To check if you programmed loss_lasso() correctly, test the function with a guess of -1.2 for all regression coefficients and use the check_info_A list defined earlier. If you programmed loss_lasso() correctly you should get a value of 269.0714. As another test, use 0.2 for all regression coefficients and use the check_info_A list defined earlier. If you programmed loss_lasso() correctly you should get a value of 123.3819.

SOLUTION

loss_lasso <- function(betas, my_info)
{
  # extract the design matrix
  X <- my_info$design_matrix
  
  # calculate linear predictor
  mu <- as.vector( X %*% as.matrix(betas) )
  
  # calculate MSE
  MSE <- mean( (my_info$yobs - mu)^2 )
  
  # calculate LASSO penalty
  penalty <- my_info$lambda * sum( abs(betas) )
  
  # return effective total loss
  0.5 * MSE + penalty
}

Test loss_lasso() with -1.2 for all regression coefficients.

loss_lasso(rep(-1.2, ncol(check_info_A$design_matrix)), check_info_A)
## [1] 269.0714

Test loss_lasso() with 0.2 for all regression coefficients.

loss_lasso(rep(0.2, ncol(check_info_A$design_matrix)), check_info_A)
## [1] 123.3819

3b)

You now have everything ready to fit the Lasso regression model! The fit_regularized_regression() function was defined to be general. You should use fit_regularized_regression() to fit the Lasso model by assigning loss_lasso to the loss_func argument. You must use the reg_info list as the my_info argument. The user supplied lambda_use value will then be used as the regularization strength applied to the Lasso penalty.

Fit the Lasso regression model with a low penalty of 0.0001 and assign the result to the check_lasso_fit_lowpenalty object.

After fitting, use the viz_compare_to_mles() function to compare the Lasso estimates to the MLEs.

How do the Lasso estimates compare to the MLEs?

SOLUTION

check_lasso_fit_lowpenalty <- fit_regularized_regression(0.0001, loss_lasso, reg_info)
viz_compare_to_mles(check_lasso_fit_lowpenalty, modA)

How do they compare?

It can be seen from the figure that the blue and red markers essentially overlap, so with this very weak penalty the Lasso estimates are nearly identical to the MLEs.

3c)

Next, let’s examine how a strong Lasso penalty impacts the coefficients.

Fit the Lasso regression model again but this time with a high penalty of 10000 and assign the result to the check_lasso_fit_highpenalty object.

After fitting, use the viz_compare_to_mles() function to compare the Lasso estimates to the MLEs.

How do the Lasso estimates compare to the MLEs?

SOLUTION

check_lasso_fit_highpenalty <- fit_regularized_regression(10000, loss_lasso, reg_info)
viz_compare_to_mles(check_lasso_fit_highpenalty, modA)

It can be seen from the figure that the two sets of estimates bear little resemblance: the strong penalty has shrunk the Lasso estimates (shown in red) essentially to 0.

3d)

Previously, you defined a sequence of \(\lambda\) values and assigned those values to the lambda_grid object. This sequence serves as the candidate search grid: we do not know the best \(\lambda\) to use, so we try out many values! The code chunk below is completed for you. The eval flag is set to FALSE and so you must set eval=TRUE in the code chunk options. The Lasso regression model is fit for each \(\lambda\) value in the search grid and the coefficient estimates are returned within a dataframe (tibble). The lasso_path_results dataframe includes the lambda value, just as the ridge_path_results object did previously.

Please note: The code chunk below may take a few minutes to complete.

lasso_path_results <- purrr::map_dfr(lambda_grid,
                                     fit_regularized_regression,
                                     loss_func = loss_lasso,
                                     my_info = reg_info)

You will use the lasso_path_results to visualize the Lasso coefficient path. As with the Ridge coefficient path from earlier in the assignment, the Intercept is removed for you to be consistent with the glmnet path visualizations.

Complete the code chunk below by piping the dataframe to ggplot(). Map the log(lambda) to the x aesthetic and the estimate to the y aesthetic. Include a geom_line() layer and map the term variable to the group aesthetic within geom_line(). Set the alpha to 0.33.

How does the figure below compare to the figure created in Problem 2h)? How are the Lasso paths different from the Ridge paths?

SOLUTION

lasso_path_results %>% 
  filter(term != '(Intercept)') %>% 
  ggplot(mapping = aes(x = log(lambda), y = estimate)) +
  geom_line(mapping = aes(group = term), alpha = 0.33)

What do you think?

The overall trend is the same: in both figures the coefficient estimates shrink towards 0 as the regularization strength increases. The key difference is that the Lasso paths drop more sharply and hit exactly 0 at finite \(\lambda\) values, one coefficient after another, whereas the Ridge paths only approach 0 asymptotically. The Lasso penalty therefore produces sparse models by turning features off.
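
The sharp drops to exactly 0 can be understood from the idealized orthonormal-design case (a simplifying assumption here, under the standard penalized least-squares scaling), where the two penalties admit closed-form estimates:

```latex
% Ridge: proportional shrinkage -- never exactly 0 at finite lambda
\hat{\beta}_j^{\text{ridge}} = \frac{\hat{\beta}_j^{\text{MLE}}}{1 + \lambda}

% Lasso: soft-thresholding -- exactly 0 whenever |MLE| <= lambda
\hat{\beta}_j^{\text{lasso}} = \operatorname{sign}\!\left(\hat{\beta}_j^{\text{MLE}}\right)\left(\left|\hat{\beta}_j^{\text{MLE}}\right| - \lambda\right)_{+}
```

Ridge rescales every coefficient by the same factor, while the Lasso soft-threshold sets any coefficient with \(|\hat{\beta}_j^{\text{MLE}}| \le \lambda\) exactly to 0, which is why the Lasso paths terminate at 0.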

Problem 04

Now that you have studied the behavior of the Ridge and Lasso penalties in greater depth, it’s time to tune the regularization strength. You will tune the regularization strength just for the Lasso model. The steps are the same for the Ridge model, but we will focus on Lasso in this question.

The code chunk below reads in another data set for you. This data set is a random test split from the same data the training set was created from. You will thus use this data set as the hold-out test set to assess or evaluate the Lasso model for each regularization strength in the search grid.

dfA_test <- readr::read_csv('hw09_probA_test.csv', col_names = TRUE)
## Rows: 24 Columns: 76
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (76): x01, x02, x03, x04, x05, x06, x07, x08, x09, x10, x11, x12, x13, x...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

A glimpse is provided below to show the variable names are the same as those in dfA_train.

dfA_test %>% glimpse()
## Rows: 24
## Columns: 76
## $ x01 <dbl> 1.14313808, 0.01816419, 0.49591261, -1.49232633, -0.62059649, -0.0…
## $ x02 <dbl> -0.6464637, -0.4662545, -0.7916580, -0.7770189, 1.7378213, -0.4956…
## $ x03 <dbl> -1.89562312, -1.83545340, 0.58783552, -0.56476710, -0.15024320, 0.…
## $ x04 <dbl> -1.815775806, 0.281280533, -0.106889840, 1.066147596, 0.484055779,…
## $ x05 <dbl> -0.7128627, 1.4877917, -1.3365942, 2.0733856, -1.3475529, 0.331148…
## $ x06 <dbl> 1.49499607, -0.58088658, -2.78972454, -0.57166274, 0.03457907, 1.7…
## $ x07 <dbl> -1.17828630, -0.07111160, -0.64334020, -0.24136133, 0.55998293, 0.…
## $ x08 <dbl> -0.319216142, -0.314247113, -0.006775336, 0.472946659, -0.99712812…
## $ x09 <dbl> -0.92278608, -0.78961832, 0.16124184, -0.37300982, 0.31941510, -0.…
## $ x10 <dbl> 0.43104894, 0.06647602, -0.38361750, -0.98903565, -0.84237893, -0.…
## $ x11 <dbl> -0.75607015, -0.83118449, -1.30590588, 0.65576000, -0.32057164, 0.…
## $ x12 <dbl> -1.61445673, 2.23094618, 0.00335797, 0.63447707, -0.83379618, -2.1…
## $ x13 <dbl> -1.04732333, 0.94678115, -1.17904809, 0.01401656, 1.10230968, 2.84…
## $ x14 <dbl> 1.24779778, 2.21320896, -0.35418771, -2.29432424, 0.32813203, -0.3…
## $ x15 <dbl> 1.152395009, 0.871427100, -0.809043870, 1.068301785, -0.319201329,…
## $ x16 <dbl> -0.68759127, 0.22346479, 0.27785233, 0.63106951, -1.20893271, -0.9…
## $ x17 <dbl> -1.32498321, 0.99321252, 0.54829226, -1.98481617, -0.68417370, 2.1…
## $ x18 <dbl> 0.57914293, 1.04769622, -1.77207000, 0.82656793, 0.06793236, -0.26…
## $ x19 <dbl> -2.7324805, 0.8359603, 0.2268546, 0.1402093, -0.7464301, -0.554660…
## $ x20 <dbl> -0.13225186, -0.11301496, 1.45685079, 1.08894467, 0.52618785, 0.48…
## $ x21 <dbl> -0.05511845, 0.67435934, -1.23584780, -0.52775922, -2.39672927, -0…
## $ x22 <dbl> 0.376887793, 1.143383468, 0.347587585, 0.366396727, 1.204298906, 1…
## $ x23 <dbl> -0.02315765, -0.17996268, -0.06702020, -0.96745663, 1.05621951, 0.…
## $ x24 <dbl> -0.79364619, -0.75141178, -1.44564412, -0.26952336, -0.07881057, -…
## $ x25 <dbl> -1.1762732, 0.2858181, -1.9007820, 1.7190158, -2.1020462, 0.521933…
## $ x26 <dbl> 0.03440645, -0.16521310, 1.37251771, 0.15838025, 0.36806501, 0.481…
## $ x27 <dbl> 0.3555787, 0.7234963, -0.0119092, 0.5693598, 0.6075264, 0.9278386,…
## $ x28 <dbl> 0.46563934, 0.63759021, 0.69499926, -1.22618311, 0.45442219, 0.330…
## $ x29 <dbl> -0.30017643, -1.03137258, -0.76110963, 1.01467089, 1.30270574, -0.…
## $ x30 <dbl> 0.45063846, -0.27610195, 0.02663231, 0.23689472, -0.49330443, 0.29…
## $ x31 <dbl> 0.37249149, 0.99151848, -1.45214015, -1.20993730, 0.95817968, 0.47…
## $ x32 <dbl> 0.160626608, 0.298804786, 1.392771898, 2.397480316, -0.350553982, …
## $ x33 <dbl> 0.6959056, -0.5876913, 1.0865048, -0.2956193, 1.1790686, 0.2194538…
## $ x34 <dbl> -1.286318824, -0.573059103, 0.709131820, 0.603993658, -0.009135101…
## $ x35 <dbl> -0.029824439, 0.120974022, -0.048754275, 1.998250597, -0.285683304…
## $ x36 <dbl> 1.74619464, -1.15938972, 0.52561374, -1.41721149, 0.45313713, -0.5…
## $ x37 <dbl> -1.6102714, 1.2100254, -0.8188211, 0.4097319, 0.3564670, 1.4065122…
## $ x38 <dbl> -0.52626336, -0.13884224, -0.97867868, 1.97088576, 0.99121364, 0.9…
## $ x39 <dbl> 0.47424567, 0.16380496, 0.09808709, -0.15455438, 0.38395056, 0.215…
## $ x40 <dbl> 1.04905686, -1.12755696, 0.81950481, -0.62597926, 0.18891435, 0.27…
## $ x41 <dbl> 0.9802753, 1.0727297, -0.1034688, -0.7738019, 1.4521328, 0.5824250…
## $ x42 <dbl> 0.8026953, 0.3912334, 0.2311717, 0.4416862, 0.6058927, 1.3619968, …
## $ x43 <dbl> 1.497014705, 0.199461083, 0.877037014, 1.275111836, -1.463546089, …
## $ x44 <dbl> 0.88361734, -0.43124514, 0.76909353, 0.34181375, 0.84916573, 0.527…
## $ x45 <dbl> -0.08048870, -0.87731052, -1.23497116, 1.45209109, 0.17710336, -0.…
## $ x46 <dbl> -0.22229459, 1.39743346, 1.46384916, 0.03334550, 0.73187037, 0.790…
## $ x47 <dbl> -1.7476152825, -0.2879971901, 1.7775635629, -0.8658681201, -0.1762…
## $ x48 <dbl> 1.07512535, -1.42892804, 1.14830168, -1.15034090, 1.14859354, -0.8…
## $ x49 <dbl> 0.11297505, -0.33298760, 2.62739250, -0.06144636, -0.62081207, -0.…
## $ x50 <dbl> -1.760714e+00, 1.546895e-01, -1.336308e+00, -5.341918e-01, -1.3781…
## $ x51 <dbl> 0.355802760, 1.261605721, -1.007393498, -0.229887771, -0.494385689…
## $ x52 <dbl> 1.12843542, 0.18742887, -0.15268669, -1.69652901, 1.00424500, -0.3…
## $ x53 <dbl> 0.2612449, 0.4780664, -0.2110005, -0.5120048, -1.3017326, 0.713045…
## $ x54 <dbl> -0.38075761, 0.84135339, 0.19037086, -0.80044132, -0.02384702, 0.1…
## $ x55 <dbl> 2.491695857, 0.463015446, 0.592801784, -0.513619781, 1.192908697, …
## $ x56 <dbl> 0.6072242, -0.6014201, -0.8298660, 1.6731066, -1.1881461, 0.800793…
## $ x57 <dbl> 0.562299802, 1.513023999, -0.919529890, -0.637432106, -1.079789149…
## $ x58 <dbl> -1.90164605, 1.32513196, -1.32907538, -0.06284810, 0.57858234, -0.…
## $ x59 <dbl> -0.678017763, 0.671531331, 0.373627202, -1.360163880, -0.798847302…
## $ x60 <dbl> -1.01817726, 0.05251363, -0.29394473, 0.89162065, 0.84220139, -0.7…
## $ x61 <dbl> -0.309767250, -1.295571666, 0.476181835, 0.987615256, -0.421788553…
## $ x62 <dbl> 0.94130874, 1.49237811, 1.48027176, 1.49906080, -1.18024281, 0.061…
## $ x63 <dbl> 1.117138820, 1.783084869, -1.758350099, 0.321090650, 0.156595164, …
## $ x64 <dbl> 0.01336110, -2.61963800, 0.23684126, -1.75412865, 1.56489203, -1.1…
## $ x65 <dbl> 1.16813558, 1.27909451, -0.23801311, -1.55946227, 0.47303683, 0.84…
## $ x66 <dbl> -0.68179048, -0.62455405, -1.04955883, 0.08481691, 0.13387617, 2.8…
## $ x67 <dbl> -0.107278267, 0.014839010, 0.584834981, -2.277171121, 2.358414125,…
## $ x68 <dbl> -0.34835071, -0.01205553, 0.15787720, -1.72791685, -0.23807589, -0…
## $ x69 <dbl> 0.891925590, 1.195514481, 0.339961334, -0.246669111, -1.672261892,…
## $ x70 <dbl> 0.81289142, 0.74375772, -0.62796234, -2.20482715, -1.21972263, 1.4…
## $ x71 <dbl> 2.27572777, 1.19452221, -0.40839006, -0.60256595, -0.83067145, 1.3…
## $ x72 <dbl> -0.69807238, 0.31327920, -0.15541045, -2.06316223, -1.24371309, 0.…
## $ x73 <dbl> 0.17105096, -0.24375675, -1.56184416, -0.09647939, 0.38517769, 0.8…
## $ x74 <dbl> 0.40569893, -0.26919348, -1.22765740, -0.78351810, -0.51345731, -1…
## $ x75 <dbl> -0.15896868, -0.73370485, -0.40814114, 0.41869379, 0.52360174, 0.6…
## $ y   <dbl> -19.9840444, -21.5075154, 11.6157884, -4.8924499, -17.1137287, 11.…

You will execute fitting the Lasso model on the training set and then testing the Lasso model on the hold-out test set. You will “score” the model by calculating the hold-out test MSE for each \(\lambda\) value in the search grid. You already fit the Lasso model on the training set, dfA_train, for each value in lambda_grid. However, you will create a function, train_and_test_regularized(), which executes both the training and testing actions. Thus, your function, train_and_test_regularized(), will do everything required to evaluate the model performance. Your function is therefore analogous to the cv.glmnet() function, the caret::train() function, and the training and testing functions within tidymodels!

4a)

The train_and_test_regularized() function is started for you in the code chunk below. It consists of 4 arguments. The first, lambda_use, is the user specified regularization strength. The second, train_data, is the training data set. The third, test_data, is the hold-out test data set. The last argument, loss_func, is the loss function used to fit the model. Notice that you do not provide the lists of required information to train_and_test_regularized(). Instead, the data are provided and train_and_test_regularized() defines the list of required information!

It is important to note that we need to standardize the training data before fitting regularized models. The testing data must then be standardized based on the training set statistics. The dfA_test data provided to you has already been appropriately pre-processed and so you do not need to worry about that here. This lets you focus on training and assessing the model performance.

Complete the code chunk below to define the train_and_test_regularized() function. The comments and variable names state the actions you should perform within each line of code.

Remember that we are working with linear additive features for all inputs.

HINT: Remember that you defined a function to allow fitting the model for a given \(\lambda\) value previously…

SOLUTION

train_and_test_regularized <- function(lambda_use, train_data, test_data, loss_func)
{
  # make design matrix on TRAINING set
  Xtrain <-  model.matrix( y ~ ., data = train_data )
  
  # make list of required information for the TRAINING set
  info_train <- list(
    yobs = train_data$y,
    design_matrix = Xtrain
  )
  
  # train the model
  train_results <- fit_regularized_regression(lambda_use, loss_func, info_train)
  
  # extract the training set coefficient estimates (completed for you)
  beta_estimates <- train_results %>% pull(estimate)
  
  # make the design matrix on the TEST set
  Xtest <- model.matrix( y ~ ., data = test_data )
  
  # predict the trend on the TEST data
  mu_test <- as.vector( Xtest %*% as.matrix( beta_estimates) )
  
  # calculate the MSE on the TEST set
  MSE_test <- mean( (test_data$y - mu_test)^2 )
  
  # book keeping (completed for you)
  list(lambda = lambda_use,
       MSE_test = MSE_test)
}
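
As a quick sanity check (a sketch, assuming dfA_train, dfA_test, and loss_lasso are defined as above), the function can be evaluated at a single candidate \(\lambda\) value:

```r
# evaluate the Lasso model at one regularization strength;
# the result is a list with the lambda value and the hold-out test MSE
train_and_test_regularized(0.01, dfA_train, dfA_test, loss_lasso)
```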

4b)

The code chunk below is completed for you. It applies the train_and_test_regularized() function to each \(\lambda\) value in the lambda_grid search grid. The loss_lasso function is assigned to the loss_func argument and thus the Lasso model is trained and tested for each regularization strength candidate value. The eval flag is set to FALSE and so you must set eval=TRUE in the code chunk options.

Please note: The code chunk below may take a few minutes to complete.

train_test_lasso_results <- purrr::map_dfr(lambda_grid,
                                           train_and_test_regularized,
                                           train_data = dfA_train,
                                           test_data = dfA_test,
                                           loss_func = loss_lasso)

A glimpse of the train/test assessment results is shown to the screen below. The result is a dataframe (tibble) with two columns. The lambda column is the regularization strength and MSE_test is the MSE on the test set.

train_test_lasso_results %>% glimpse()
## Rows: 101
## Columns: 2
## $ lambda   <dbl> 0.0001000000, 0.0001202264, 0.0001445440, 0.0001737801, 0.000…
## $ MSE_test <dbl> 788.0273, 787.6335, 787.1600, 786.6299, 786.0216, 785.3100, 7…

You must visualize the test set performance by plotting the hold-out test MSE with respect to the log of the regularization strength.

Create a line plot in ggplot2 to visualize the hold-out test MSE vs the log of the regularization strength.

Which \(\lambda\) value yields the best test set performance?

SOLUTION

ggplot(train_test_lasso_results, mapping = aes(x = log(lambda), y = MSE_test)) +
  geom_line()
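
To make the minimum easier to read off, a vertical reference line can be added at the best \(\lambda\) (a sketch reusing the same results dataframe; best_log_lambda is a new helper name):

```r
# locate the log-lambda with the lowest hold-out test MSE
best_log_lambda <- with(train_test_lasso_results, log(lambda[which.min(MSE_test)]))

# redraw the test MSE curve with a dashed line marking the minimum
ggplot(train_test_lasso_results, mapping = aes(x = log(lambda), y = MSE_test)) +
  geom_line() +
  geom_vline(xintercept = best_log_lambda, color = 'red', linetype = 'dashed')
```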

Which \(\lambda\) is the best?

The best \(\lambda\) is the one that yields the lowest MSE on the hold-out test set. From the figure, the minimum occurs near \(\log(\lambda) \approx 0\), i.e. a \(\lambda\) value near 1.

4c)

You must now compare the Lasso estimates associated with the best \(\lambda\) value to the MLEs.

HINT: Remember that you already calculated the Lasso estimates for all \(\lambda\) values within the lasso_path_results object. You just need to filter that object for the appropriate lambda value to identify the best Lasso estimates.

SOLUTION

best_lambda <- train_test_lasso_results %>% 
  filter(MSE_test == min(MSE_test)) %>% 
  pull(lambda)

lasso_path_results %>% 
  filter(lambda == best_lambda) %>% 
  viz_compare_to_mles(modA)

4d)

You just used a train/test split to validate and tune a machine learning model! However, a single train/test split does not allow us to estimate the reliability of the performance. We are not able to state how confident we are in the identified best \(\lambda\) value in this case. Resampling methods, such as cross-validation, provide more robust and stable results than a single train/test split.

However, you will not execute the resampling manually. You will use the cv.glmnet() function from the glmnet library to manage the cross-validation and “scoring” for you.

The glmnet library is loaded for you in the code chunk below.

library(glmnet)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loaded glmnet 4.1-7

The dfA_train and dfA_test splits were randomly created by splitting a larger data set with an 80/20 train/test split. The code chunk below binds the rows together, recreating the original data set. This allows cv.glmnet() to manage splitting the original 120 observations rather than the smaller randomly created training set.

dfA <- dfA_train %>% bind_rows(dfA_test)

Let’s start out by fitting the Lasso model directly with glmnet before executing the cross-validation. This way we can compare our previous Lasso path to the glmnet path!

You need to create the design matrix consistent with the glmnet requirements before fitting the glmnet model. You must therefore create a new design matrix for the linear additive features for all inputs but remove the intercept column from the design matrix.

Complete the code chunk below by defining the glmnet required design matrix, extracting the response vector, and then fitting the Lasso model with glmnet.

Assign the lambda_grid object you created previously to the lambda argument within glmnet(). You are therefore using a custom lambda grid rather than the default set of lambda values.

SOLUTION

Xenet <- model.matrix( y ~ . - 1, data = dfA)

yenet <- dfA$y

lasso_glmnet <- glmnet(Xenet, yenet, lambda = lambda_grid)

4e)

Use the glmnet default plot method to visualize the coefficient path for the glmnet trained Lasso model. You must assign the xvar argument in the plot function call to 'lambda' to show the path consistent with your previous figures.

How does the glmnet created coefficient path compare to Lasso coefficient path you created in Problem 3d)?

SOLUTION

plot( lasso_glmnet, xvar = 'lambda' )

How do they compare?
The glmnet coefficient path is very similar to the Lasso coefficient path we created in Problem 3d): in both figures, more and more coefficients are shrunk exactly to 0 as the penalty \(\lambda\) increases. The glmnet path appears smoother because glmnet fits the model with a specialized coordinate descent algorithm rather than the general-purpose optimizer we used in Problem 3d).

4f)

Now it’s time to tune the Lasso model with cross-validation. You will specifically use 5 fold cross-validation and you must use the same \(\lambda\) search grid defined previously within lambda_grid.

Use the cv.glmnet() function to tune the Lasso model with 5-fold cross-validation and the defined search grid. Assign the lambda_grid object to the lambda argument within cv.glmnet(). Assign the result to the lasso_cv object.

Plot the cross-validation results to the screen.

How many non-zero features does the best Lasso model have? How many non-zero features does the Lasso model recommended by the 1-standard error rule have?

NOTE: The random seed is set for you below to force the splits to be the same every time you run the code chunk.

SOLUTION

set.seed(71231)
lasso_cv <- cv.glmnet(Xenet, yenet, nfolds = 5, lambda = lambda_grid)
plot(lasso_cv)
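
The feature counts shown along the top axis of the plot can also be checked programmatically (a sketch, assuming the lasso_cv object above):

```r
# lambda values selected by the minimum CV error and the 1-standard-error rule
lasso_cv$lambda.min
lasso_cv$lambda.1se

# number of non-zero coefficients (excluding the intercept) at each selection
sum( as.vector( coef(lasso_cv, s = 'lambda.min') )[-1] != 0 )
sum( as.vector( coef(lasso_cv, s = 'lambda.1se') )[-1] != 0 )
```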

Based on the plot, the best Lasso model (the \(\lambda\) minimizing the cross-validation error) has roughly 33 non-zero features. The simpler model recommended by the 1-standard-error rule retains roughly the same number, around 33 non-zero features.

4g)

Which inputs matter based on the tuned Lasso model?

SOLUTION


coef( lasso_cv )
## 76 x 1 sparse Matrix of class "dgCMatrix"
##                       s1
## (Intercept) -0.007617687
## x01          .          
## x02          .          
## x03          5.215631713
## x04          0.220175722
## x05          .          
## x06         -0.300914143
## x07         -5.459198976
## x08          .          
## x09          .          
## x10          0.284394135
## x11          .          
## x12          .          
## x13          .          
## x14         -0.461203315
## x15          .          
## x16          .          
## x17          .          
## x18         -0.068095578
## x19          .          
## x20          .          
## x21          2.495201526
## x22          .          
## x23          .          
## x24          .          
## x25         -0.166388039
## x26          .          
## x27          .          
## x28          .          
## x29          .          
## x30          .          
## x31          .          
## x32          .          
## x33         -2.225614397
## x34         -0.163791963
## x35          .          
## x36          .          
## x37          .          
## x38          .          
## x39         -0.225130678
## x40          .          
## x41          .          
## x42          .          
## x43          0.656559741
## x44         -0.728392015
## x45          .          
## x46          .          
## x47          .          
## x48          .          
## x49          0.308977038
## x50          .          
## x51         -0.414803117
## x52          .          
## x53          .          
## x54          0.022807744
## x55          .          
## x56          .          
## x57          0.228032406
## x58          .          
## x59          .          
## x60          .          
## x61          0.615284264
## x62          .          
## x63          .          
## x64          .          
## x65         -0.442303305
## x66          0.622675666
## x67          .          
## x68          .          
## x69         -0.530980464
## x70          .          
## x71          0.547087051
## x72          .          
## x73          .          
## x74          .          
## x75          0.256955412

The inputs with non-zero coefficients are the ones that matter based on the tuned Lasso model; the remaining inputs have been turned off by shrinking their coefficients exactly to 0. Judging by coefficient magnitude, the most influential inputs are x03, x07, x21, and x33.

Problem 5

Correlated features cause problems for many models. One approach to try and identify if the features correlation matters is through tuning the elastic net model. The elastic net blends Ridge and Lasso penalties and involves two tuning parameters. The regularization strength is the same parameter you worked with in the previous problems. The mixing fraction is the weight applied to blend the Ridge and Lasso penalties. Tuning the mixing fraction will help us identify if the feature correlation impacts the model behavior.

The code chunk below reads in a data set that will let you practice working with correlated features.

dfB <- readr::read_csv('hw09_probB.csv', col_names = TRUE)
## Rows: 110 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (10): x1, x2, x3, x4, x5, x6, x7, x8, x9, y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The glimpse provided below shows there are 9 continuous inputs and 1 continuous response.

dfB %>% glimpse()
## Rows: 110
## Columns: 10
## $ x1 <dbl> 3.09869212, -0.27058881, 0.18559238, 0.16634118, -1.23293151, -1.36…
## $ x2 <dbl> 2.94214328, -1.17373978, -0.01076317, -0.65687092, -1.14843444, 0.1…
## $ x3 <dbl> 3.28268099, 0.15014082, -0.27188083, -0.75522974, -0.90504324, -0.0…
## $ x4 <dbl> 2.37049510, -0.55569220, -0.06411316, -1.20347199, -0.67747417, -0.…
## $ x5 <dbl> 2.95419609, -0.69823484, 0.23546653, -0.97386688, -1.44499383, -0.1…
## $ x6 <dbl> 3.5092130, -0.8562281, 0.6292396, -0.4149860, -1.5292260, 0.1067256…
## $ x7 <dbl> 1.58927120, -1.28057044, -0.21849387, -1.41114837, -0.11611693, 1.1…
## $ x8 <dbl> 2.9937227, -1.6100189, 0.5300147, -0.5169738, -0.3193423, -0.208050…
## $ x9 <dbl> 2.19334318, -0.58670454, 0.70500888, -0.25869602, -0.05056980, 0.13…
## $ y  <dbl> -20.84643348, -5.89326160, 0.08635077, -5.83994552, 5.77691663, 1.4…

The reshaped long-format data set is created for you in the code chunk below. The response y is not stacked and thus the long-format includes all inputs, x1 through x9 stacked together.

lfB <- dfB %>% 
  tibble::rowid_to_column() %>% 
  pivot_longer(starts_with('x'))

lfB %>% glimpse()
## Rows: 990
## Columns: 4
## $ rowid <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3…
## $ y     <dbl> -20.84643348, -20.84643348, -20.84643348, -20.84643348, -20.8464…
## $ name  <chr> "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x1", "x2"…
## $ value <dbl> 3.09869212, 2.94214328, 3.28268099, 2.37049510, 2.95419609, 3.50…

5a)

Create a scatter plot to show the relationship between the output and each input. Use the reshaped long-format data set to create facets for each input.

SOLUTION

ggplot(lfB, mapping = aes(x = value, y = y)) +
  geom_point() +
  facet_wrap(~name)

5b)

You visualized the output to input relationships, but you must also examine the relationship between the inputs! The cor() function can be used to calculate the correlation matrix between all columns in a data frame.

Correlation matrices are created from wide-format data and so you will use the original wide-format data, dfB, for this question instead of the long-format data, lfB.

Pipe dfB into select() and select all columns except the response y. Pipe the result to the cor() function and display the correlation matrix to screen.

SOLUTION

select(dfB, -y) %>% 
  cor()
##           x1        x2        x3        x4        x5        x6        x7
## x1 1.0000000 0.9100358 0.8325641 0.8625994 0.8706074 0.9176084 0.7733840
## x2 0.9100358 1.0000000 0.8563563 0.8754078 0.8964453 0.9164890 0.7798010
## x3 0.8325641 0.8563563 1.0000000 0.8207980 0.8225810 0.8381175 0.6932513
## x4 0.8625994 0.8754078 0.8207980 1.0000000 0.8114702 0.8668589 0.7536997
## x5 0.8706074 0.8964453 0.8225810 0.8114702 1.0000000 0.8677033 0.7503743
## x6 0.9176084 0.9164890 0.8381175 0.8668589 0.8677033 1.0000000 0.7545994
## x7 0.7733840 0.7798010 0.6932513 0.7536997 0.7503743 0.7545994 1.0000000
## x8 0.7661651 0.7683296 0.6959127 0.7647050 0.7307950 0.7661827 0.6671351
## x9 0.7623510 0.7709967 0.7050866 0.7632158 0.7416818 0.7680993 0.6617914
##           x8        x9
## x1 0.7661651 0.7623510
## x2 0.7683296 0.7709967
## x3 0.6959127 0.7050866
## x4 0.7647050 0.7632158
## x5 0.7307950 0.7416818
## x6 0.7661827 0.7680993
## x7 0.6671351 0.6617914
## x8 1.0000000 0.6664688
## x9 0.6664688 1.0000000

5c)

Rather than displaying the numbers of the correlation matrix, let’s create a correlation plot to visualize the correlation coefficient between each pair of inputs. The corrplot package provides the corrplot() function to easily create clean and simple correlation plots. You do not have to load the corrplot package, instead you will call the corrplot() function from corrplot using the :: operator. Thus, you will call the function as corrplot::corrplot().

The first argument to corrplot::corrplot() is a correlation matrix. You must therefore calculate the correlation matrix associated with a data frame and pass that matrix into corrplot::corrplot().

You will create two correlation plots. For the first, use the default input arguments to corrplot::corrplot(). For the second, assign the type argument to 'upper' to visualize the correlation plot as an upper-triangular matrix.

Based on your visualizations, are the inputs correlated?

SOLUTION

select(dfB, -y) %>% 
  cor() %>% 
  corrplot::corrplot()

select(dfB, -y) %>% 
  cor() %>% 
  corrplot::corrplot(type = 'upper')

The darker the color, the stronger the correlation between that pair of inputs. The figure shows that all of the inputs are strongly, positively correlated with each other, with correlation coefficients ranging from roughly 0.66 to 0.92.

5d)

In lecture we discussed that we can easily create more features than inputs by considering interactions between the inputs! Thus, when exploring if feature correlation could impact model results, we should not simply examine the correlation between the inputs. We should also examine the correlation between the features derived from the inputs!

For this problem, let’s work with all pair wise interactions between the inputs. You should therefore create a model that has all main effects and all pair wise products between the inputs.

Define the design matrix that includes all main effects and pair wise products between the inputs associated with dfB. Assign the result to the XBpairs object.

We are focused on exploring the relationship between the features and so you must remove the intercept column of ones from this design matrix.

How many columns are in the design matrix?

SOLUTION

XBpairs <- model.matrix( y ~ (.)^2 - 1, dfB)
XBpairs %>% dim()
## [1] 110  45

There are 45 columns in the design matrix: 9 main effects plus choose(9, 2) = 36 pair wise interactions.

5e)

Create the correlation plot between all features in the XBpairs design matrix. Assign the type argument to 'upper' and set the method argument to 'square' in corrplot::corrplot() to visualize the correlation coefficients as colored square “tiles” within an upper triangular matrix.

HINT: Remember that the correlation matrix must be calculated first before calling corrplot::corrplot()!

SOLUTION

XBpairs %>% 
  cor() %>% 
  corrplot::corrplot(type = 'upper', method = 'square')

5f)

The corrplot::corrplot() function can reorder correlation plots to group correlated variables in clusters. This can help us “see” correlation structure easier compared to keeping the variables in their original order.

Create the correlation plot between all features in the XBpairs design matrix again. However, you should set the order argument to 'hclust' and the hclust.method argument to 'ward.D2'. Continue to set method to 'square' and type to 'upper'.

HINT: Remember that the correlation matrix must be calculated first before calling corrplot::corrplot()!

SOLUTION

XBpairs %>% 
  cor() %>% 
  corrplot::corrplot(type = 'upper', method = 'square',
                     order = 'hclust', hclust.method = 'ward.D2')

5g)

You will now use caret to tune the mixing fraction and regularization strength of the elastic net model. The caret library is loaded for you in the code chunk below.

library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift

You must specify the resampling scheme that caret will use to train, assess, and tune a model. You worked with caret in earlier assignments and there are several examples provided in lecture if you need additional help.

Specify the resampling scheme to be 5 fold with 5 repeats. Assign the result of the trainControl() function to the my_ctrl object. Specify the primary performance metric to be 'RMSE' and assign that to the my_metric object.

SOLUTION

my_ctrl <- trainControl(method = 'repeatedcv', number = 5, repeats = 5)

my_metric <- "RMSE"

5h)

You must train, assess, and tune an elastic model using the default caret tuning grid. In the caret::train() function you must use the formula interface to specify a model with all pair wise interactions to predict the response y. Assign the method argument to 'glmnet' and set the metric argument to my_metric. You must also instruct caret to standardize the features by setting the preProcess argument equal to c('center', 'scale'). Assign the trControl argument to the my_ctrl object.

Important: The caret::train() function works with the formula interface. Thus, even though you are using glmnet to fit the model, caret does NOT require you to organize the design matrix as required by glmnet! Thus, you do NOT need to remove the intercept when defining the formula to caret::train()!

Train, assess, and tune the glmnet elastic net model consisting of main effects and all pair wise interactions with the defined resampling scheme. Assign the result to the enet_default object and display the result to the screen.

Which tuning parameter combinations are considered to be the best?

Is the best set of tuning parameters more consistent with Lasso or Ridge regression?

Does the feature correlation seem to be impacting model behavior in this application?

SOLUTION

The random seed is set for you for reproducibility.

set.seed(1234)
enet_default <- train(y ~ (.)^2,
                      data = dfB,
                      method = 'glmnet',
                      metric = my_metric,
                      preProcess = c("center", "scale"),
                      trControl = my_ctrl)

enet_default
## glmnet 
## 
## 110 samples
##   9 predictor
## 
## Pre-processing: centered (45), scaled (45) 
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 87, 88, 87, 89, 89, 88, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda      RMSE      Rsquared   MAE     
##   0.10   0.01798082  4.921227  0.8038755  3.879746
##   0.10   0.17980820  4.364162  0.8387418  3.483446
##   0.10   1.79808197  4.977807  0.7785130  4.067109
##   0.55   0.01798082  4.752239  0.8156402  3.750969
##   0.55   0.17980820  4.207071  0.8493438  3.406831
##   0.55   1.79808197  5.370753  0.7453470  4.331699
##   1.00   0.01798082  4.677062  0.8195274  3.690341
##   1.00   0.17980820  4.260269  0.8434932  3.452308
##   1.00   1.79808197  5.537609  0.7496795  4.420238
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0.55 and lambda = 0.1798082.

RMSE was used to select the optimal model using the smallest value; the final tuning parameter values were alpha = 0.55 and lambda = 0.1798082. Since alpha = 0.55 is closer to 1 than to 0, the best set of tuning parameters is more consistent with Lasso than with Ridge regression. Feature correlation does not seem to be impacting model behavior in this application.
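The default grid only tries three values of each tuning parameter. If finer control over the search is desired, a custom grid can be supplied via the `tuneGrid` argument. A sketch (the `alpha` and `lambda` values below are illustrative choices, not required by the assignment):

```r
# illustrative custom tuning grid for glmnet via caret
enet_grid <- expand.grid(alpha = seq(0.1, 1, by = 0.1),
                         lambda = exp(seq(log(0.01), log(2), length.out = 25)))

set.seed(1234)
enet_tuned <- train(y ~ (.)^2,
                    data = dfB,
                    method = 'glmnet',
                    metric = my_metric,
                    tuneGrid = enet_grid,
                    preProcess = c("center", "scale"),
                    trControl = my_ctrl)
```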

5i)

Display the coefficients to the screen for the tuned elastic net model. Which coefficients are non-zero?

SOLUTION

coef( enet_default$finalModel, s = enet_default$bestTune$lambda )
## 46 x 1 sparse Matrix of class "dgCMatrix"
##                     s1
## (Intercept) -3.9787738
## x1           0.4709670
## x2           .        
## x3           .        
## x4          -1.0551157
## x5           .        
## x6           .        
## x7          -1.2677801
## x8           1.7497374
## x9          -0.6197409
## x1:x2        .        
## x1:x3        .        
## x1:x4        .        
## x1:x5        .        
## x1:x6        .        
## x1:x7        .        
## x1:x8       -1.4007389
## x1:x9        .        
## x2:x3        .        
## x2:x4        .        
## x2:x5        .        
## x2:x6        0.9468922
## x2:x7        .        
## x2:x8       -0.6740257
## x2:x9        1.1333283
## x3:x4        .        
## x3:x5        .        
## x3:x6        .        
## x3:x7        .        
## x3:x8       -0.6640331
## x3:x9        .        
## x4:x5        .        
## x4:x6        .        
## x4:x7       -1.8214194
## x4:x8        .        
## x4:x9        .        
## x5:x6        .        
## x5:x7        1.3474542
## x5:x8        .        
## x5:x9        .        
## x6:x7        .        
## x6:x8       -0.2226738
## x6:x9        1.6007788
## x7:x8       -5.2821782
## x7:x9        3.6603517
## x8:x9       -7.8244994
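The non-zero coefficients can also be extracted programmatically rather than read off the printout, using the sparse matrix returned by `coef()` (a sketch):

```r
# names of the features retained by the tuned elastic net model
coef_mat <- coef(enet_default$finalModel, s = enet_default$bestTune$lambda)
nonzero <- rownames(coef_mat)[as.vector(coef_mat != 0)]
setdiff(nonzero, "(Intercept)")
```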

5j)

caret provides several useful helper functions for ranking the features based on their influence on the response. This is known as ranking the variable importances and the varImp() function will extract the variable importances from a model for you. Wrapping the varImp() result with plot() will create a figure showing the variable importances. By default, the displayed importance values are in a relative scale with 100 corresponding to the most important variable.

Plot the variable importances for the caret tuned elastic net model. Are the rankings consistent with the magnitude of the coefficients you printed to the screen in Problem 5i)?

SOLUTION

plot(varImp(enet_default))

The two most important features are x8:x9 and x7:x8. The rankings are consistent with the magnitude of the coefficients printed to the screen in Problem 5i).
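The numeric importance scores behind the plot can also be inspected as a table (a sketch):

```r
# top 10 features by relative importance (100 = most important)
varImp(enet_default)$importance %>% 
  tibble::rownames_to_column("feature") %>% 
  arrange(desc(Overall)) %>% 
  head(10)
```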

Problem 06

We have focused heavily on working with continuous inputs this semester. However, linear models may also use categorical inputs as features. This problem introduces working with categorical inputs by fitting non-Bayesian linear models with lm().

The data you will work with is loaded for you in the code chunk below.

dfC <- readr::read_csv('hw09_probC.csv', col_names = TRUE)
## Rows: 75 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): v
## dbl (2): x, y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The glimpse shows there are two inputs, x and v, and one continuous response y. The v input is non-numeric.

dfC %>% glimpse()
## Rows: 75
## Columns: 3
## $ x <dbl> -1.65522340, -0.40027559, -0.80346451, 0.76686259, 0.93237497, 0.010…
## $ v <chr> "A", "A", "A", "B", "D", "D", "D", "A", "C", "D", "C", "A", "B", "A"…
## $ y <dbl> -1.0995699, -2.0743885, -2.2994641, 1.5491779, -0.1327855, -0.381488…

6a)

Create a bar chart in ggplot2 to show the counts associated with each unique value of the categorical variable v.

SOLUTION

ggplot(dfC,mapping = aes(x = v)) +
  geom_bar() 

6b)

Next, let’s focus on the relationship between the continuous output y and the continuous input x.

Create a scatter plot in ggplot2 to show the relationship between y and x.

Manually assign the marker size to 4.

SOLUTION

ggplot(dfC, mapping = aes(x = x, y = y)) +
  geom_point(size = 4) 

6c)

When working with mixed categorical and continuous inputs, it is important to consider whether the output to input relationship depends on the categorical variable. A simple way to explore such an effect is by coloring the markers in a scatter plot by the categorical variable.

Repeat the same scatter plot between y and x that you created in 6b) but this time map the color aesthetic to the v categorical variable.

Does the output to input relationship appear to vary across the levels (categories) of v?

HINT: Remember the importance of the aes() function!

SOLUTION

ggplot(dfC,mapping = aes(x = x, y = y)) +
  geom_point(mapping = aes(color = v))

The output to input relationship appears to vary across the levels (categories) of v. For A, the trend is positive; for C, the trend is negative.

6d)

Sometimes we need to include smoothers in our plots to help guide our interpretation of the trends. The geom_smooth() function allows adding a smoothing layer on top of a scatter plot. You used this function earlier in the semester and will use it now to help your exploration.

Repeat the same scatter plot between y and x that you created in 6c). You should thus continue to map color to the v variable. This time however, add in a geom_smooth() layer. Assign the formula argument to y ~ x and assign the method argument to lm (you do not need quotes). You must also map the color aesthetic to v within the geom_smooth() layer.

Does the output to input relationship appear to vary across the levels (categories) of v?

HINT: Remember the importance of the aes() function!

SOLUTION

ggplot(dfC, mapping = aes(x = x, y = y)) +
  geom_point(mapping = aes(color = v)) +
  geom_smooth(formula = y ~ x,
              method = lm,
              mapping = aes(color = v))

The output to input relationship appears to vary across the levels (categories) of v. For A and B, the trend is positive; for C and D, the trend is negative.

6e)

Let’s fit two simple models for this application. The first will use linear additive features. Thus, you should add the effect of the continuous input x to the categorical input v.

Fit a non-Bayesian linear model to predict the response y based on linear additive features between the two inputs. Assign the model object to the modC_add object and display the model summary() to the screen.

How many coefficients are estimated?

What does the Intercept correspond to?

SOLUTION

modC_add <- lm( y ~ x + v, dfC )
modC_add %>% summary()
## 
## Call:
## lm(formula = y ~ x + v, data = dfC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7640 -0.9464  0.0123  1.0740  3.0260 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -1.0249     0.3327  -3.081  0.00295 **
## x            -0.3562     0.2025  -1.759  0.08294 . 
## vB            1.7558     0.5479   3.204  0.00204 **
## vC           -0.2124     0.5088  -0.417  0.67769   
## vD            0.1857     0.4956   0.375  0.70897   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.585 on 70 degrees of freedom
## Multiple R-squared:  0.2075, Adjusted R-squared:  0.1622 
## F-statistic: 4.583 on 4 and 70 DF,  p-value: 0.002408

There are 5 coefficients estimated: the intercept, the slope for x, and three dummy variable coefficients for v (vB, vC, vD). Because lm() uses level A as the reference level, the Intercept corresponds to the expected value of y when x = 0 and v = A.
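The 5 coefficients arise because lm() dummy codes the categorical input: with 4 levels, A is absorbed into the intercept and each of the remaining 3 levels receives an indicator column. This can be seen directly from the design matrix (a sketch):

```r
# first rows of the additive design matrix: intercept, x, and 3 dummy columns
model.matrix(y ~ x + v, dfC) %>% head()
```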

6f)

Next, let’s interact the categorical and continuous inputs!

Fit a non-Bayesian linear model to predict the response y based on interacting the categorical input v and the continuous input x. Your model should include all main effects and interaction features. Assign the model object to the modC_int object and display the model summary() to the screen.

How many coefficients are estimated?

What does the intercept correspond to?

SOLUTION

modC_int <- lm( y ~ x * v, data = dfC )

modC_int %>% summary()
## 
## Call:
## lm(formula = y ~ x * v, data = dfC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6171 -0.8263  0.1105  0.7751  1.8525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.4748     0.2394  -1.984 0.051404 .  
## x             1.0819     0.2740   3.948 0.000191 ***
## vB            1.4377     0.3829   3.755 0.000365 ***
## vC           -0.2113     0.3493  -0.605 0.547224    
## vD           -0.3786     0.3409  -1.111 0.270633    
## x:vB          0.1585     0.5043   0.314 0.754209    
## x:vC         -3.0104     0.3545  -8.492 3.13e-12 ***
## x:vD         -1.2546     0.3686  -3.403 0.001127 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.054 on 67 degrees of freedom
## Multiple R-squared:  0.6643, Adjusted R-squared:  0.6293 
## F-statistic: 18.94 on 7 and 67 DF,  p-value: 1.085e-13

There are 8 coefficients estimated: the intercept, the slope for x, three dummy variable coefficients for v, and three interaction coefficients (x:vB, x:vC, x:vD). As in the additive model, the Intercept corresponds to the expected value of y when x = 0 and v = A.
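Each level of v therefore has its own slope with respect to x: level A's slope is the x coefficient, and every other level adds its x:v interaction term to it. A sketch that computes the four level-specific slopes from the fitted coefficients:

```r
# per-level slopes implied by the interaction model
b <- coef(modC_int)
c(A = unname(b["x"]),
  B = unname(b["x"] + b["x:vB"]),
  C = unname(b["x"] + b["x:vC"]),
  D = unname(b["x"] + b["x:vD"]))
```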

6g)

Which of the two models, modC_add or modC_int, are better?

Is your result consistent with the figure you created in 6d?

SOLUTION


# compute AIC and BIC for each model
AIC(modC_add, modC_int)
##          df      AIC
## modC_add  6 288.7518
## modC_int  9 230.3214
BIC(modC_add, modC_int)
##          df      BIC
## modC_add  6 302.6567
## modC_int  9 251.1788

We can use the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) to compare the two models. These criteria balance model fit and complexity, penalizing models that overfit the data; the model with the lower AIC or BIC value is considered the better model. By both criteria, modC_int is better. This result is consistent with the figure created in 6d, which showed the slope of the trend changing across the levels of v, an effect only the interaction model can capture.
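Because the additive model is nested within the interaction model, a formal F-test offers another way to compare them (a sketch):

```r
# nested model comparison: do the interaction terms significantly improve fit?
anova(modC_add, modC_int)
```

A small p-value for the second row would indicate that the interaction features add significant explanatory power beyond the additive model.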